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Abstract: NonStop SQL is an implementation of ANSI SQL on Tandem 
Computer Systems. It provides distributed data and distributed execution. It can 
run on small computers and has been benchmarked at over 200 transactions per 
second on a large system. Hence, it is useable in both tiie mformauon center and m 
production environments. NonStop SQL provides high-availabihty through a 
combination of NonStop device support and transaction mechanisms. Ihe 
combination of SQL semantics and a message-based distributed operatmg system 
gives a surprising result: the message savings of a relational mterface pay for the 
extra semantics of the SQL language when compared to record-at-a-tmie mterfaces. 

This paper presents the system's design rational, and contrasts it to previous 
research prototypes and to other SQL implementations. 
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INTRODUCTION 

NonStop SQL is an implementation of ANSI SQL [ANSI]. In addition to the ease- 
of-use implicit in SQL, NonStop SQL is a high-performance, distributed SQL 
which can be used both in the information center and in production on-line 
transaction processing applications. It has the performance, integrity, 
administrative, and utility feamres required to support hundreds of transactions per 
second running against hundreds of gigabytes of database. 

Prior SQL implementations are marketed as information center tools or as 
productivity tools. Their ease-of-use is accompanied by a significant performance 
penalty. These vendors typically offer a second, non-SQL, system for production 
applications. Tandem rejected tfiis "dual database" strategy as being too expensive 
to support, and too expensive and cumbersome for customers to use. A major goal 
was to produce a system that could be used on large and small systems and in the 
information center as well as for production on-line transaction processing 
applications. 

NonStop SQL had several other design goals: 

1) To be integrated witii the Tandem networking and transaction processing 
system. 

2) To provide NonStop access to data ~ in case of a fault, only the affected 
transactions are aborted and restarted: data is never unavailable. 

3) To support modular hardware growth, and as a consequence support tens of 
processors executing hundreds of transactions per second. 

4) To allow data and execution to be distributed in a local and long-haul 
network. 

These goals are related. Tandem's existing support for networking and transactions 
gave a good basis for distributed data and execution. NonStop operation is 
Tandem's hallmark. The challenge was to integrate the SQL language with this 
preexisting Tandem system architecture. 

Just as importantly, some goals were excluded from the first release. There was 
little attempt to exploit the parallel architecture of the Tandem system to get 
parallelism withm a transaction in the style of Teradata [Teradata]; rather, 
parallelism is exploited by having multiple independent transactions executing at 
once. The implementation did not focus on solving the heterogeneous database 
problem. In addition, beyond an interactive SQL interface and a report writer, not 
much work was devoted to end-user tools like QBE or a fourth-generation 
language. Rather, work focused on the SQL engine and features to help application 
programmers build systems in conventional ways. 

Now that NonStop SQL is available, we are seriously considering projects in each 
of these neglected areas. The NonStop SQL design provides an excellent base for a 



highly paraUel SQL implementation. In addition, SQL is a natural base for data 
sharmg among heterogeneous systems, because most systems support SQL. It is 
also an excellent base for end-user and fourth generation languages. 



AN OVERVIEW OF THE TANDEM SYSTEM 

Tandem builds a single-fault-tolerant, distributed system for on-line transaction 
processing. The system can be grown in small increments by adding processors and 
disks to a cluster, and clusters to a network. 

Hardware Architecture 

The Tandem hardware architecture consists of conventional processor each with 
up to 16Mb of main memory and a 5Mb/sec burst multiplexor io channel. The 
processors do not share memory. Dual 20Mb/sec local networks can connect up to 
16 processors. A fiber-optic extender allows four-plexed 1 Mb/sec interconnect of 
up to 224 processors (see Figure 1). 

Communication and disk device controllers are dual ported and attach to two 
processors so that there are dual paths to the storage and communication media. 

Disk modules come in two styles ~ low-cost-per-actuator and low-cost-per- 
niegabyte. These modules are packaged 2, 4, 6, or 8 to a cabinet Typically, each 
disk is duplexed so that media and electronic failures of a disk will not interrupt 
service. 

Operating System and Network 

Processes are the software analog of hardware modules. They are the unit of 
functionality, failure, and growth. Messages are used to communicate among 
processes. Shared memory communication is avoided because it gives poor fault 
containment and because it limits the ability of processes to reside anywhere in the 
network. The operating system kernel provides processes, process pairs, and a 
rehable datagram service among nodes in the cluster [Bartlett]. A privileged layer 
of software uses these datagrams to provide a session-oriented message system 
among processes in the cluster, and transparently extends the message system to a 
long haul net based on leased lines, X.25, SNA, and other protocols. 

Above the message system, everything looks like a process. A device is a process, a 
file is a process and a running program is a process. An application OPENs a 
process by name. Then the apphcation operates on the object with procedure calls 
m conventional Cobol style. The underlying message system turns these into 
remote procedure calls. Typical operations are READ and WRITE, but different 
object types support a variety of other operations. 




Figure 1. A schematic of a distributed Tandem system showing three sites. Two of 
the sites are large clusters of 32 processors and associated disks Unked by a high 
speed local network. One site is a smaU two-processor system. The sites are linked 
via a public or leased network. 



Objects are named by "site.process.directory.object". Security is checked at open 
(and purge, rename, secure,...) by the message system to see that the requester has 
access to the site, and by the process to see that the requester has access to the object. 
A conventional access-control list scheme is used to control security [Safeguard] It 
optionally provides logging of accesses or access violations. 

Data Management 

The majority of applications built on Tandem systems are programmed in Cobol 
Most of the data management tools, generically caUed Encompass, are built to 
support that development style. Encompass is built around the concept of an 



application dictionary which holds the definitions of all data structures used hv 
programs, files, reports, and display screens [Pathmaker]. '^^^'"'^^ "^^^ ^y 

File creation may be driven off this dictionary. 

^nSmn 9^0^^ "■* ^y^'«"'..'=alled Enscribe. is of particular interest here because 
Nonstop SQL co-ex.sts with it and has a similar design Enscribe snnnort! 
unstructured (Unix like) files which are used to store p^gramr^d text Tal^o 
supports three Icmds of structured files: key-sequenced (B-t^) reMve (diri« 

Spi^ro^^Sctr ^'"^■^ ""'^ '' ="* ^"^ ^'"^ «""^^^- 

ke"v m^if '™, w '' T/ ^ ? •"' •'°°''' ^»"S "'^ks of the netwotk based on 
dfv^,S^!ff i J honzontal partitioning is transparent to the application m 
div sion of labor in file management is instructive (see Fieure 2) Rarh fnio^^n; 7e 
a file has a label descn;bing the ffle. When the file sy'emTcUen ) o^nsfe tl^I 
disk process (server) returns this descriptor. Based on infomation in tWs 
on ,h7fif . ff^y^^-" *=° opens all related partitions of the fifea^d aU Sd cc 
on the fie and their partitions. When a read or write request is nresenteS fn Tml 
system, it uses the record key to decide which disk proTss c^Sc^e * J^f 

Ssk p^"s mZlr.Th'""^ "" "^ ^''^' *^ ^'^'y^'^" *"-' ^nd?a readme 
aisK process managmg the appropriate mdex and, based on the index record -^endl 

t^r-i,%*' ^PP™P.™te base-file disk proce s. Simila^S system is 
responsible for issuing inserts and deletes on altemate indices wh^ iSordslTi,^ 
serted, updated or deleted in the base table. The disk s^eHs i^oSbk f^^ 

SterSo"fiSr.'^T°'*'°^'^^''°¥'"^- "="^S<=-^^^^^^ 

used S pages ^' '"^''S^'"^"' "^ ^ '""in memory cache of recently 

^o^ff""?' ^t'"* servers provide fault tolerance by executing as NonSton 
process pairs which tolerate single faults of media, paths, Ld processors The di^k 
process supports a DO/UNDO/REDO protocol for^ansacZ pretect°Jn 
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Figure 2. The division of labor in Enscribe between the application process, the file 
system, and the disk server processes. The file system runs as a protected subsystem 
of the application process. Disk servers run in a processor connected to the disk 
they manage. In general, the file system communicates with many disk processes 
and the correspondmg disks. This figure can be compared to Figure 6 which shows 
the division of labor in the NonStop SQL system. 



Transaction Management 

Files may be designated as audited (transaction protected), either when they are 
created or at a later time. Updates to audited files are automatically protected by 
locks belonging to the transaction and by undo and redo log records. 

An application program can issue BeginTransaction, which allocates a unique 
transaction identifier. This identifier is automatically included in all messages sent 
by the application and its servers. All updates to audited files generate log records 
and locks tagged with this transaction identifier. Eventually, the application calls 
EndTransaction, which commits the transaction if possible, or it calls 
AbortTransaction, which undoes the audited updates of the transaction [Borr]. 
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The underlying mechanism provides transaction back out, distributed transactions 
with a commit protocol implemented with the grouped, two-phase, and presumed- 
abort protocols [Mohan]. A single transaction log (audit trail) is maintained at each 
site, rhis audit trail provides undo, redo, and media recovery for both old 
(Enscnbe) and new (SQL) data. 

The Tandem system tolerates any single fault without interrupting service If there 
IS a double fault or an extended power outage, the transaction recovery system 
recovers data by aborting all uncommitted transactions and redoing recently 
committed transactions. In case of double media failure, the transaction manager 
supports media recovery from an archive copy of the data and the transaction log 
by redomg recently committed transactions. Archive dumps of the database can be 
capmred while the database is being updated - that is, media recovery can work 
from a fuzzy dump. 

Why SQL? 

Perhaps the most controversial decision of the NonStop SQL project was to 
abandon compatibiUty with Tandem's existing data base products and adopt an SOL 
mterface mstead After aU, Encompass was the first commercial distributed 
database system. It has many strong features and a loyal following. 

In retrospect, the choice of SQL seems less courageous since SQL has become the 
standard data management interface. At the time, the rationale for adopting SOL 
was that the Encompass dictionary is passive and proprietary. Encompass provided 
a record-at-a-time interface for programmers and little data independence. Like 
most such systems, it was built on top of the file and security system, rather than 
being mtegrated with it Customers were asking for an integrated and active 
dictionary - one which had no "back-doors" and one which assured consistency 
between die dictionary and applications. In addition, customers were asking for 
support of views and mtegrity constraints. SQL provides views and a standard data 
definition and manipulation language. NonStop SQL provides views, integrity 
constraints, and an active, distributed and integrated dictionary. Tandem is 

iLtn^bTedTnVQl^"^^^^^^ ^^^'^^'^^ '^^^^^^"^^^^ ^^^^^^^^^^ '^ ^ 

After settling on SQL, the build-vs-buy decision had to be made. Several software 
houses were willing to port their SQL systems to Tandem hardware. This 
alternative offered a low-cost and low-risk solution. It also offered low-benefit 
Tandem wanted SQL to be integrated, fault-tolerant, high-performance, and 
distobuted. So, a second courageous decision was made: to start from scratch. 
Fortunately, several Tandem developers and managers had experience on other 
SQL implementations. This considerably reduced the risk of a new 
implementation. 



Nonstop SQL LANGUAGE FEATURES 

The NonStop SQL language is based on SQL as documented in the System R papers 
[Astrahan], the SQL/DS manuals [SQL/DS], the DB2 manuals [DB2], and the ANSI 
SQL definition [ANSI], Extensions and variations were added to support 
distributed data, high-performance, operational interfaces, and integration with 
the Tandem system. When development began, the ANSI SQL standard [ANSI] did 
not exist. Fortunately, only minor changes were required to achieve close 
compliance with die ANSI standard. 

Naming 

The first chore was to decide how naming, directories, and security should work. 
Standard SQL naming has the form "user.table". This is inadequate for a 
distributed system with local autonomy. In such a system, user names are qualified 
by site name and table names need to be more complex so that names may be created 
without consulting any other site [Lindsay]. 

NonStop SQL names objects lilce any other objects in the Tandem system. That 
naming convention is "site.process.directory .object". These names are used to 
name tables, indices, views, and programs. Naming of columns follows the ANSI 
SQL conventions. Integrity constraints are named by ANSI SQL identifiers so that 
diagnostic messages can explain which constraint is violated. 

Considerable care was taken to make catalogs and naming automatic. The Tandem 
default naming works for SQL objects. Programs, tables, views, and all other 
system objects are named in the same way. The concept of logical and physical 
schema is almost invisible (automatic) because table names and their corresponding 
file names are the same. The goal was that most SQL examples from Date [Date] 
should work without change when entered from a termmal. 

Having one naming convention for the whole Tandem system simplifies learning 
and operating the system. 

Logical Names for Location Independence 

System administrators and application designers need the ability to bind a program 
to new tables without altering the source code. In production systems, a program is 
typically created and tested in a test environment and then moved to a production 
environment. In a distributed system a program may be duplicated at many 
different sites. Running a report against many instances of a generic table is 
another common situation. With most SQL systems, each of these situations require 
editing the program and changing the table names to reference the desired tables. 
Imbedding literal file names, or any kind of literals, in programs or reports is a 
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?o?i" R 'T' ^^L?^ statements plus Cobol FD statements solved this problem in 
iy()4. But most SQL implementations reintroduce the problem. 

Nonstop SQL offers logical names, caUed "defines", which allow users to rebind a 
programs table names at SQL compile or run time without altering the source 
^""^^A^; S-t ^""^ ^^""'^^"^ "" function to OS/360 DD cards, TSO ALLOCATES 
attributes ^^^'' ^^^ associate a logical name with a physical name and its 

Defines are used for many other purposes within the Tandem system. For example 
they may carry label tape attributes. The define statement is supported by all 

Tn^lf^A'^'TT^'^''^^ ''*^^'^^"' ' P'°""'' ^^y programmatically interrogate 
and alter its defines. This programmatic interface differentiates defines from Aeir 
counterparts m other systems. 

Logical names are used as foUows by NonStop SQL. Consider the simple example 

SELECT dept 

FROM emp 

WHERE empno = :empno 

7.^}' '^^^^iT^ ^ ^ ^^'^ environment, then 'emp' will be bound to a particular 
table. When the program is moved to a production environment, it should run with 
toe^^mp bound to the production 'emp' table. In NonStop SQL this is done by 

SELECT dept 

^M =emp - the "=" implies a logical name 

wrtHKb empno = :empno 

^ n'^TS^""^ '' ^''"^£^'' ^ particular table or view by issuing a define like: 
UiiWJNE =emp, FILE site.process.test.emp 

ctogi^TdSt^' *"'" ""''• " "'^ "" '"°'"' '° p"''""'°" "y '''"'Ply 

DEFINE =emp, FILE site.process.production.emp 

When a compiled SQL program is invoked with a different define for "=emp" the 
new define ovemdes the compUe-time define and causes temporary recompHation 
of that statement. Process creation propagates defines so that dieXd process has 
the same nammg envu-onment (context) as its parent. 

?hf ™r'^? "^ f^' mechanism for both data definition and data manipulation. 
They work uniformly m the conversational and programmatic environment Thev 
are supported as a standard part of the Tandem system naming mechanism and so 
provide a general tool that works for aU files, tables, devices and praces^ 



Dictionary and Catalogs 

The descriptive information representing the logical schema is kept in an SQL 
database called the dictionary. Information about a table is replicated at every site 
having a fragment of the table, so that the local parts of the table can be accessed 
even if the site is disconnected from the network. This design rule, called local 
autonomy, implies that the dictionary be partitioned into catalogs - each catalog 
acting as a local dictionary for tables at that site. The transaction mechanism is used 
to protect updates to catalogs and so maintain the consistency among catalogs at 
different sites. A simplified diagram of the catalogs is shown in Figure 3. 

A site can have a single catalog, a catalog per project or application, or a catalog per 
individual. It all depends on the organization's management. The CREATE 

CATALOG <directory> command creates the catalog tables described by Figure 3 
in the designated directory. The command also registers the catalog in a site 
directory so that aU catalogs may easily be found. 

When an SQL object is created, it is registered in the dictionary. Creation 
commands can explicitly specify a catalog; otherwise, the default catalog of the 
executing process is used. For example, die table creation command: 
CREATE TABLE sitel,diskl,dir,emp (emp_no INTEGER, 

dep_no INTEGER, 

PRIMARY KEY emp_no ) 

(PARTITION site2.disk2,dir.emp FIRST KEY 10000 
CATALOG site2.disk3,du-4); 
creates a table partitioned across two sites of the network. Local autonomy requires 
that the definition be replicated in catalogs at each participating site. So in the 
example above, the second partition is registered in a catalog at site2. Executing 
this create statement makes entries in the process" default local catalog and also in 
the explicit remote catalog named site2,disk3.dir. Compiled versions of the table 
descriptions are stored in the file labels as part of the fde manager's disk directory. 
All necessary information about a table can be read from the file label as part of the 
file system's OPEN step. Consequently, the catalogs are only examined at SQL 
compile time. This pomt will be amplified later. A program using this table from 
site3 will be registered in a catalog at site3. The program's usage of the table will 
be recorded in the USAGES table at sitel and site3. Figure 3 gives a schematic of 
the catalogs. 
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used catalog 
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Figure 3. A schematic of the catalog tables. 
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The catalog tables shown in Figure 3 have the following semantics 

* TABLES has the name, owner, and security of each table or view. 

* VIEWS has the view definition stored as a character string. 

* COLUMNS has a description of each column of each table or view. 

* BASETABS has information that applies to tables but not to views. 

* CONSTRAINTS has the text of integrity constraints attached to tables. 

* FILES describes the physical files that support tables and indices. 

* PARTITIONS gives name and catalog of partitions of any table in this catalog. 
INDEXES describes an index file, its key and whether it is unique. 

* KEYS describes the permutation of the index columns to the base table columns. 

* PROGRAMS describes programs which contain SQL. It tells when each 
program was last compiled, and whether the compilation has since been 
invalidated by changes to the database. 

* COMMENTS stores user provided descriptive text about objects. 

* USAGES is a concordance which shows who uses what. It is used to generate 
where-used reports and to direct program invalidation when a relevant table or 
view definition changes, and to relate indices and views to their base tables. 

An additional table not shown in the figure gives the version of the current catalog. 

Unifying Logical and Physical DDL 

The SQL language integrates data definition and data manipulation in one language. 
This is a nice simplification when compared to the DBTG separation of data 
definition fi-om data manipulation [DBTG]. NonStop SQL continues this trend by 
merging the definition of logical schema (tables) with the definition of physical 
schema (files). It hides the logical-physical distinction from the user, but the 
underlying implementation makes a clear distinction between logical and physical 
attributes. To give a specific example, a table named BMP (short names are used 
here) with several keys, comments and integrity constraints along with other table 
attributes is stored in the dictionary describing the logical schema. In addition, the 
table content is represented by a physical file named EMP which may be partitioned 
across multiple disks and each key is represented by a separate index file. One way 
of saying this for the DB2 user is that NonStop SQL eliminates the concept of 
DBSPACE [DB2] or individual DATABASES in Ingres [Ingres] or Informix 
[Informix]. 

In NonStop SQI,, when a table is created the user can specify its physical attributes. 
Unspecified attributes are defaulted. -To be compatible with ANSI SQL, all 
physical attributes have defaults. For example the default organization is indexed, 
and the default block size is 4k bytes. The naive user can be completely unaware of 
the logical-physical distinction. In addition, this distinction is only visible at data 
definition time. Data manipulation statements are completely insulated fi-om the 
physical attributes of files; they operate on logical tables. 
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LOGICAL SCHEMA 




PHYSICAL SCHEMA 




Figure 4. NonStop SQL implements a two schema architecture. The logical 

Logical Table Attributes 

fi MQn ^ ^?^ implements all the logical attributes of Level 2 ANSI SOL DDL 
[ANSI] wiA the exception of reals, nulls, authorization, and use:.. TTie onSsiSi of 
Z^ir^\^'^^ i"" ^^!f' compelling arguments against nuUs and fTdefaulJ 
[Date]; but. now that nulls are part of the standard, NonStop SQL will likely s™ 
Aem m a subsequent release. Defaults are implemented as%i^ified iT^sTso^^^ 
Each type has a default value which the user can override. ^ ^ "^^^ ^^^' 

Users may add integrity constraints to a table. Constraints are named single 
SfiST'^ T^e constraint is validated against the table when the co^Sis 
Stln r The^^^ft^^ ^y insert or update which violates the constraint will be 
rejected Constramts are enforced by the disk process (file server). This removes 

^7m"vT5:iderS dr "!? r^'^'' P^^^^- ^^^ -nstiin^ax^Trd 
and may be added or dropped at any tmie, NonStop SQL's unplementation is 
sbghtly more general than the ANSI SQL definition of <check constLin^^ANSQ 

It t!^^^ ^^hj'T^S^ ^^' °^ INTEGER. NUMERIC, and DECIMAL up to 64 
b^ts are supported. Fixed and variable length character strings are supported 
FLOAT IS not supported in the first release. ^upportea. 

13 



Records cannot span blocks and so are limited to 4K bytes in the initial release. 
Indices involving multiple fields, ascending and descending, may be defined on 
tables. 

Comments on all objects are supported. Referential integrity will be implemented 
when the ANSI proposal is approved. 

Physical Table Attributes 

Tables and indices can be horizontally partitioned in the network. Each partition 
resides on a particular disk volume. The partitioning criteria are user-specified 
low-key values. New partitions can be added to the end of a table. Although 
partitioning tables in a network is much discussed, NonStop SQL is the first SQL 
system to offer partitionmg of tables across multiple sites of a networic with full 
transaction recovery. 

For performance, various file organizations are supported. The standard file 
organization is a B-tree based on the table's primary key. Sophisticated users can 
specify a relative file (directly addressable records), or an entry sequence file 
(insert only at the end, records cannot grow once inserted). If the user does not 
specify an organization or key when the table is created, then the default is a key 
sequenced table with the GMT juhan timestamp of each insert used as the system 
generated key record key. System generated keys for relative, entry sequenced, 
and default files appear in the table as a column named S YSKEY. 

By default, tables are covered by transaction protection (logging and locking), but 
this is a user option. Each table has a default lock granularity which can be set or 
changed. The default is record granularity, but the designer can request generic 
locking on a key range. As mentioned later, these defaults can be overridden by a 
program or statement. 

Views 

As the R* designers pointed out, implementing views in a distributed system with 
local autonomy is not straightforward. The ideas of "shorthand view" and 
"protection view" were borrowed from the R* project [Williams]. On closer 
inspection, only the names were borrowed. In NonStop SQL, any view which is a 
simple projection and selection of a single table may be declared to be a protection 
view. A protection view inherits the organization, indices and partitioning of its 
underlymg table. In addition, protection views are updateable. To simplify 
authorization, only the table owner can defme (and own) a protection view. The 
accessor is authorized to the view. 

Protection views are implemented by the SQL kemel and consequently really do 
provide protection - one can grant access to the view without granting access to the 
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underlying tables. This differs from other implementations which place a security 
mechanism on top of the operating system protection mechanism. There are no 
back-doors to this protection mechanism. The protection kernel knows about 
tables and views and there is no "lower-level" access to the "underlying" files 
This is one of the benefits of integrating SQL with the operating system. 

By contrast, a shorthand view is simply a macro definition. When the view name 
IS invoked, the view definition is substimted and the authorization and qu^ 
optmiization is as though the macro body had been entered directly. So the user of a 
t'^'l^t ""-^"^ """'' ^^ authorized to its underlying protection views and tables 
Shorthand views are very general. They allow combinations of tables and views via 
projection, selection, joms and aggregates; but, they do not provide protection and 
are not updateable. They can be used to denomialize the database or subset ir^ 



convement ways. 
Data Manipulation 



Nonstop SQL supports Uvel 2 ANSI DML with the exceptions of unions, nuUs 
and the partial backout of failed set operations. The data manipulation iiiterface 
was extended m the areas of naming, concurrency control, and dynamic SOL As 
ah-eady mentioned, naming was extended to work in the Tandem style of network 
nammg and of logical table names. INSERT, SELECT, UPDATC DELETE 
subquenes, aggregates, correlation, group by, functions, and cursors, ire provided 
as m the standard UMON is not implemented in the first release. D^ SQL 
IS supported m the style of SQL/DS or DB2 with some imiovations to handle 

.tr^fT7-^ ""P^i ""^ ""^"' ^^^^^^^^" ^^ ^^J°^ D^ innovations were b Ae 
areas of locking and consistency. 

As in most other SQL implementations, NonStop SQL requires cursors used for 
update to be declared "FOR UPDATE ON <f,eld Ll>". TOs allows SqI to avoW 
aie Halloween Problem (see page 37). In addition it tells the SQL executor to g Jt^ 
exclusive lock and avoid the most common fonn of deadlock ^ two procLTefrea^ 
s'^nd^rf" '^ " "'"^'" " ''°''P°'- Unfortunately, this is not paS of fte SQL 

All update operations on transactional files automatically acquire exclusive locks 
held to end of transaction (degree 1 consistency is automatic) For seSns fte 
programmer is given the option of dirty data (BROWSE ACCESS! cnWnr 
stability (STABLE ACCESS), repeatable reads (REPEAT^LEACCTSS) ZZ 

Sfj ^^Z f Sranulanty and mode. These options are an extension to 
the SELECT statement syntax. Cursor stabUity is the default 



IS 



Rather than clutter SQL syntax with Tandem extensions, the CONTROL TABLE 
verb was added. This verb allows the programmer/user to modify some of the 
defaults associated with all tables or with a particular table. For example, to 
change the timeout to one second on all requests, the user may execute: 

CONTROL TABLE * TIMEOUT 1 SECOND; 
When encountered by the compiler, this statement affects all subsequent statements 
untU another CONTROL TABLE is encountered. When executed dynamically, this 
statement overrides dynamic compile-time settings and timeout values for all 
subsequent dynamic SQL statements. 

In the first release, the CONTROL TABLE command allows a program to specify 
the lock granularity, lock protocol, and style of lock waits. Later releases will have 
additional options. 

This design is in contrast to other SQL systems which associate control with the 
transaction or program rather than with the statement. Finer granularity control 
on a statement-by-statement basis or table basis is essential for tuning high- 
performance applications. 

Degree 3 consistency was sacrificed in one case to avoid hotspots. When inserting 
at the end of an entry sequenced or relative file, NonStop SQL locks the inserted 
record but does not lock the end-of-file. This gives only degree 2 consistency 
[Gray], since insert order may differ from commit order unless a table lock is 
acquired. Locking the end of the table (file) for the duration of a transaction is 
equivalent to a table lock for sequential files ~ a well known bottleneck in high- 
performance systems. IMS/FastPath solves this problem by deferring sequential 
inserts until transaction commit [Gawlick]. The FastPath design has more complex 
semantics and also sacrifices degree 3 consistency since the records do not appear in 
the file when they are inserted. 

Transaction Management and Concurrency Control 

Tandem's Encompass data management system provides a transaction mechanism 
that includes transaction back out and distributed transactions. Nonstop SQL is 
mtegrated with this transaction manager [Borr], A single transaction log (audit 
traU) is maintained at each site. This log provides undo, redo, and media recovery 
for old (Enscribe) and new (SQL) data. A single transaction can contain both 
Enscribe and SQL calls, and is recorded in a single log per site. Having one trans- 
action log per site eases system management when compared with the non- 
integrated log-per-subsystem of other designs. 

When a table is created, it is audited by default. Updates to audited tables are 
automatically protected by locks belonging to the transaction and by undo and redo 
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log records. The user may declare a table to be non-audited by altering the audit 
attribute. 

Any program locking or updating an audited table must be part of a transaction 
oT^^rxl''.?.^^?^ ^^^^ ^' ^^ servicing a request from some process that did a 
BEGIN WORK). As explained earlier, the operating system message kernel 
manages the propagation of transaction identifiers as part of the requester-server 
(remote procedure call) flow within the transaction. 

The locking features include file, generic, and record granularities, automatic 
escalation to coarser granularity, implicit or explicit shared and exclusive lock 
^^nl'rJ^^f^^^"^^^^ °^ consistency, selectable on a per statement basis, and a 
LUCK 1 ABLE verb. Deadlock detection is via timeout. The default timeout is 

sixty seconds. 

TTie CONTROL TABLE verb aUows the user to specify the handling of lock waits 
The user may request "bounce" locks which never wait for a busy resource- rather 

m^n^^ frn^f^^'^'u^u^^^'^''''' ]^°^^'°P ^^^ '"PP^^ read-through locks 
^rVxrroru ^ A^???^' ^^^^ ^^^^ ^^^"^g ^irty data (uncommitted updates). The 
cumKUL 1 ABLE command aUows a program to alter these attributes on a table- 
by-table or statement-by-statement basis. 

Non-audited objects complicated the design, since non-transactional lockmg is 
explicit and ha^complex semantics. In particular, the state of locks and cursors 
after a COMMIT WORK verb is difficult to explain. NonStop SQL defined som^ 
rules which seem consistent. A FREE RESOURCES verb was introduced to free 
locks and cursors on non-audited tables. The COMMIT WORK and ROLLBACK 
WORK verbs implicitly issue a FREE RESOURCES verb. For simplicity, design 
experts recommend against the use of non-audited tables for anything but 
temporary results. 

Non-audited tables are useful for batch applications using the old-master new- 
master recovery protocol. A table may be made temporarily un-audited during a 
batch run. Non-audited tables are used as scratch files for sort, and the query 
evaluator. In addition, to reduce log activity and the consequent large log files 
indices creation and the table load utUity run non-audited and then alter the table to 
be audited. 

Local Autonomy 

Local autonomy requires that NonStop SQL offer access to local data even if part of 
It IS unavailable and even if the site is isolated from the rest of the network For 
compiled SQL plans this means that the SQL compiler must automatically pick a 
new plan if a chosen access path (i.e. index) becomes unavailable This 
recompilation is automatic and transparent to the user 
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Data definition operations pose more difficult problems. If a table is partitioned 
among several nodes of a network, then dropping the table requires work at each of 
those nodes -- both updates to the catalogs and deletion of the files. Changing table 
attributes has similar requirements. In general NonStop SQL requires that all 
nodes related to a table participate in the DDL operation. If any relevant node, 
catalog, or disk server is unavailable then the DDL operation is denied. This 
violates local autonomy. There are no attractive alternatives here. NonStop SQL 
takes the following approach. The local owner can DISCONNECT a table to break 
all linkages of the table with unavailable sites. This is a unilateral operation. 
Thereafter, the user can operate on the table without complaint. At a later time the 
user can run a utility which attempts to CONNECT the pieces of a disconnected 
table back together. 

Conversational Interface 

Almost all SQL systems come with a program which executes any SQL statement 
entered from a terminal. This is an easy program to write since most SQL 
implementations provide dynamic SQL: a facility to PREPARE, DESCRIBE and 
EXECUTE any SQL statement (much like EVAL in LISP). 

Systems differ in the bells and whistles they provide with this basic facility. The 
NonStop SQL conversational interface (SQLCI) includes extensive documentation 
via on-lme help text, online diagnostic messages, and it includes an iterative report 
writer. 

The NonStop SQL manuals are written so that they may be accessed by the con- 
versational interface. The user can ask for help on any topic in the manual or for 
explanation of an error message. For example, HELP CREATE TABLE will 
display the syntax, semantics, and examples from the manual. This unbedded 
documentation is very convenient. 

The report writer features included headings, footings, breaks, data display 
formats for numbers and dates, wmdowing, margins, folding, and if-then-else 
controls on data display. Report writer commands are separated from the SQL 
commands. The answer set is defined with SQL and a report defined with the 
report writer syntax, then the report is displayed. The report definition is iterative; 
the report format can be altered and then the report regenerated. 

Dynamic SQL can be exercised from the conversational interface. Parameters can 
be included in SQL statements. Statements can be PREPARED and EXECUTED. 
Parameters can be varied after the prepare and before the execution of the prepared 
(compiled) statement 
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WORTorToLLB^^^^^^ T'^f'' ^'^ ^^^^ ^^^^ ^"^ COMMIT 

•/^ or KXJLLBACK WORK. If no transaction is specified, the conversational 
interface starts and ends one for each SQL statement. versauonal 

In addition, the conversational interface can execute program scriots give, 
complete statistics on the cost (cpu, records examined, records affected, messages 

errnrT^r'^m^l^^^"""''' P'"^"^"^ ^^^^'^^^ ^^^g^^^^^^ "^"sages in case of 
er^oi, EXPLAINS the query execution plan chosen by the SQL compiler and has 
mterfacestothefoUowingutUities: ""^pucr, ananas 

* Convert to and from "old" Enscribe files. 

* DupHcate and copy sets of objects 

* Display table definitions 

* Describe sets of files and tables 

* Update the statistics for a table 

* Load and unload data for interchange or reorganization 
Verify and correct dictionary consistency 

! S^^^^^^i? "where-used" reports on programs, views, and tables 

* Text editor 

Last but not least, the interface is always hstening to the terminal so the user can 
intemipt die queiy execution at any time and cancel the transaction Sumri^^^^^^ 

^n?,hr A ^^^'"^ ?' ^^^ conversational interfaces. It certainly cauTed Tome 
trouble. A dedicated process was introduced just to listen to the temimal 

Programmatic SQL 

Nonstop SQL had four goals: ease-of-use, distribution, high availabihty and high- 
performance. Smce the SQL language is non-procedural, thTK hfgh 

^rn^T^^'f ''.f^'i compilation of the SQL statemems and^efficiem 
cormnunicationofdatab^^^^ 

hn^M'^f^'^ t* l^\T'^' [ANSI-21 SQL statements am imbedded in die 
host language, bracketed by EXEC SQL and END-EXEC keywords ZiSOL 

^c7^TZ .f Tnr*' Program text and produces a host LgTge program 

addt on "^^ ^^^ '''''"^'f '"P^^^^^ ^y ^^"^ '^ ^^ SQL exfcufon^ S 
addition the preprocessor produces a file containing the SQL statements Th^ 

prtrL 'SfeTor' ""f ^' '^ ^ ^^^^^^^ «- to^ prodtelL^b^c 
program. The SQL compiler transforms SQL statements to a set of execution 
plans, one for each SQL statement in the source program (see fig^ 5A). 
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Figure 5A: CompQation of a Cobol85 plus SQL program. Cobol85 and SQL source 
is separated by SQL preprocessor which produces a Cobol85 source program and a 
object file containing the SQL source. The Cobol85 compiler feeds the object 
program to the binder. The binder combines this with the SQL source sections. 
The SQL compiler reads these source sections and the SQL dictionary to produce an 
SQL object program which is included in the "whole" object file and registered in a 
catalog. This object file is executable by the hardware and SQL executor. 

Host Language Features 

NonStop SQL's programmatic interface has many features to ease programming, 
including comprehensive diagnostics imbedded in output listings, ability to invoke 
data declarations of tables from the catalogs, support for WHENEVER (exception 
handling), support for multiple levels of copy libraries, and generation of tracing 
information so that application programmers can trace errors back to source 
language statements. 
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In addition, NonStop SQL supports separate compilation. Cursors may be spUt 
across compilations. That is, a cursor may be defined in one compilation and used 
m another. Program "B" can use cursor "C" of program "A" by referring to 
cursor A.C". ^ 

Only a Cobol85 preprocessor is provided in the beta release. Pascal C and Tal 
language support is being developed . ' ' 

Integrating SQL Programs With Object Programs 

Tandem's implementation of program compilation is similar to the original System 
R implementation [Astrahan]. An innovation lies in the binding of the SQL source 
and object programs with the Cobol85 (or other host language) object modules 
1 he resulting object program is a single object which can be moved, copied 
archived, or purged without having to manipulate one or more separate "access' 
modules . By contrast, most other SQL systems store the SQL program in the 
A^n DROTPRO^C^^^^ "^"^^''^ ^'""^'^ ^^ unavailable to user commands other 

The Tandem binder (aka link editor) was modified to support SQL source program 
sections and SQL object sections and the dependency between the object program 
and its SQL sections. The binder combines code sections, data sections, symbol 
table sections (for the symbolic debugger), and other types of sections from 
various compilations to produce an executable object program file. The SOL 
preprocessor produces an SQL source section for each program unit which Se 
Cobol85 compiler teUs the binder to include as a section of the Cobol85 object 
program. The user can bind several object programs together and present the 
result to the SQL compiler. The buider automaticaUy includes the SQL source 
sections used by the object programs into a new object file. The SQL compiler 
takes any object program as input, extracts all the SQL source sections from the 
object program and chooses a plan for each statement. The combmed plans are 
stored m an SQL object section of the object program (see Figure 5A). 

ff an SQL statement needs to be recompiled, the SQL compiler reads it from the 
^QL source section of the executing object program file. 

This design aUows programs to be archived and moved without accessing the SOL 
source. It greatly simplifies the management of SQL programs. The recurring 
Aeme here is that close integration of SQL with standard system tools has 
considerable benefits in simplicity and functionality. 
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Static and Dynamic Compilation 



An SQL program must be compiled and registered in a catalog before it may be 
nm. This step is called static compilation and is part of the macro step of SQL + 
Cobol85 compilation. 

If the program is moved or renamed, the new program will not be registered in a 
catalog and so it must be statically SQL compiled before it may be used. 

Once compiled, an SQL program will be automatically recompiled when the 
database changes or the plans become invalid. Adding or dropping an index to a 
table, updating statistics on a table, adding columns to a table or changing some 
attributes of the table are examples of changes which can invalidate a program. 

If a program is invoked with logical names that override the compile time logical 
names, it will be recompiled. In addition, if a needed access path is inaccessible 
because the network or media is down, then the program will be recompiled to 
work with the available data. 

As the name suggests, automatic recompilation is transparent to the application 
program. It happens when the invalid statement is first invoked. A statement may 
become invalid during the program execution. If so, the statement is recompiled 
on the fly as part of the user's transaction and then executed with the new plan. 

Run Time Statistics and Error Reporting 

The NonStop SQL programmatic interface has two control blocks which feed 
information back to the application program. Detailed diagnostics are provided in 
the SQL Communications Area (SQLCA) in case of error. Any SQL error is 
reported by an error code along with up to seven diagnostic messages. A routine is 
provided to format these messages in a specified national language and display them 
to a diagnostic stream or to a buffer. 

In addition, the SQL executor remms detailed statistics in the SQL statistics area 
(SQLSA). These include a table-by-table count of records used, records examined, 
disk reads, messages, message bytes, waits, time-outs, escalations, and busy times. 
Designers can either instrument their programs by reading these counters or they 
can use the operating system's measurement facility [Measure] to read these SQL 
counters along with other system counters: 
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IMPLEMENTATION 



The Nonstop SQL system and its utilities are about .7 million lines of code 
.mp emented by up to 25 develope.^ over 3 years. This was the s xS SOL 
jmplementation for one developer. Five others had done it at least once before 
Many of the developers had implemented other relational systems. We t^ed no° to 
repeat our earlier mistakes. 

Implenienting SQL itself took only a few people; most of the effort went into 
integration with the rest of the system, and into utilities. The quality assurance 

rnl^^T""' '"'"''°''' ''?^"™«"»'io". ""arketing. education, and betl test efforts 
mvolved many more people in the product. «=»icrions 

^^.jT'cn/'- '"'^S'?!^'' r * ** "P""""? ^y"*™- When the system is up 
Nonstop SQL is up. One does not bring up an SQL database; it is just there TOs 
contrasts with most other designs. In addition, tecause the opemtag system^d 
SQL authorization are integrated, there is no "logon" to SQLfone logs on to fte 
system. Once on the system, the entire network pravides a siigle system iTage. 

Cohols'^^lf.InT^^^ ^''^^^'^' ^"'"' " convei^ational interface or they can write 
ser^f in'SwavTr ,"' "^^ *'• "''"• '^''' P^^nuns can beLtalled as 
aSSn [S^].'''"'''"" ""^^""S ^y'"^'" '° P™"''^ ^ high-perfonnance 

Figure 6 shows the structure of an executing appUcation program. The program 

l^fvolfft '?f,T'"' """^ ' ^""^ '*^"" (2^^ ''y")- '^^ Persia's SQL statements 
TwranJlJenf ''''"'"' ' '"' ""'^^'^ ™'"^^' ^^'^ ™° ^ the application^ 

The executor has its own data space. The executor manages query execution hv 
managmg logical names, collecting records from various tablesll^^ fte ffl™ ystem 

S»?f .S •!• ''^ ^y^'*" *'* smgle-variable requests. The executor sends 

m,l«fT°" 'l''"""!' '" ^ '^P^'« P™'^^^'' ("<" ^hown in Figure 6) which 
manages the catalogs. TTie catalog manager is implemented as a slame pr^ress 
for authorization reasons - only the catalog mai^ger process can'^Ste ?Sg 
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APPLICATION PROCESS 



APPLICATION 



Host Interface 
Execution of plans 
joins 

aggregates 
subcontract ddl 
subcontract compile 



PROTECTION 



Tables-> Files 
Partitions 
Indices 

Transactions 

Buffering 



DISK SERVERS 



single variable que 
project+select 



\ 

query\ 



rr. 
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ADMINISTRATION 

Authorization 

Plan invalidation checkis 

Protection views 
SQL OPERATIONS 

Select/update subset 

Check constraints 
FILE STRUCTURES 

B-tree, relative,... 

locking 

logging 
MEDIA MANAGEMENT 

space allocation 

duplexing 

main memory cache 




Figure 6A. The division of labor within a compiled and executing Cobol plus SQL 
program. The application calls the SQL executor which handles the SQL plans and 
subcontracts single variable queries to the files system and disc process. The file 
system manages the physical schema. The disc process manages subsets and 
integrity constraints on table fragments (partitions). Se also figure 6B. 
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SKf<Jn?Lt!!^?'^K^' compUed and executing program. The appUcation 
caus the SQL executor which m turn calls the fUe system. The file system sends 

n !w """ ''"'S """^T '° '"* P"^"^'- •"'« disk process doeJpmTectiom 
pro^s ^ and executor or is updated or deleted by the disk 
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The file system manages the physical schema. It handles opens of files and indices, 
partitioning, and sends requests to appropriate disk servers, and buffers the replies.' 
When a table is updated, the file system manages the updates to the base file and to 
all the secondary indices on the file. When a record is retrieved via an index, the 
file system first looks in the index and then in the base file. This is because the index 
may well be on a disk separate from the base table, so the disk process cannot do 
index maintenance in general. If a retrieval can be entirely satisfied by the index, 
the base table is not accessed - this is called semi-join by Palermo [Palermo]. 

Each disk volume is managed by a set of disk servers. These server processes have 
a common request queue and a shared buffer pool which they coordinate via 
semaphores. The disk servers implement file fragments. They manage disk space, 
access paths, locks, log records, and a main memory buffer pool of recently used 
blocks. A disk server operates on the single-variable query, scanning the database 
(usually via the primary index) to find records which satisfy the selection 
expression. Once the records are found, the disk server either operates on the 
records (update expression or delete), or the projected records are returned to the 
file system. If the request is a very long one (more than 10 ios), the disk server 
returns to the file system asking it to continue the request by reissuing it. This 
prevents the servers from being monopolized by a particular requester if other 
work is queued. 

The file system is one protection domain (it is privUeged). Each disk process is a 
separate protection domain (a process). Each disk process authorizes the 
application process to the table when the file system sends the iRle open request. An 
open to a protection view is authorized by the disk process, which checks that the 
requester has the needed authority to the view. 

The SQL Compiler 

The NonStop SQL compiler, which includes an optimizer, uses the standard tricks 
of a SQL query compiler. It picks an SQL statement execution plan which 
minimizes a cost function, combining io cost, cpu cost, and message cost. Smgle- 
variable queries are subcontracted to the disk servers storing the data partitions, but 
otherwise the entire query is executed by the application program process. ' In 
particular, all joins are computed by moving the projected and selected fragments 
to the SQL executor in the application process. The compUer has two join strategies 
(nested and merge). It sorts a table if no appropriate access path is available for a 
join. 

NonStop SQL's approach to distributed query execution is controversial Other 
systems, notably R* [Williams], devote considerable effort to optimizing the 
distributed execution of jom queries. We believe that these optimizations make 
httle sense m a local network where communication is fast and cheap, and that joins 
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m a long-haul network are infeasible for tables of credible size. Consequently not 
much effort was devoted to this problem. When the high priority feLres have 

proMem^ ' '^''' '^^''' """^ ^ ^"^"^'^ '^ ^' transcontinenS job 

The compiler composes shorthand views and treats protection views just like 
ordinary tables. It handles logical names (does name binding), and resolve! 
partially quahfied names within the process' working directoiy. ^^soives 

In the style of pB2^ the compiler generates a set of control blocks which are 
mterpreted by the SQL executor, mther than generating machine language cod^ 
It was felt that die control block interface was more maintainable and S Ae o^y 
opSed ^ '^''"''' '' '"P'"''^'^ evaluation, which is higWy 

^slTo^lte fh^^of ' *' ^^NTROL TABLE statement aUows the sophisticated 
user to give the SQL compiler hmts conceming lock granularity, and lock protocol 
SVi? '5f f to both static and dynamic SQL. With time, m^ other Sim 

Tr. -A I'^'^'^^i ^^^^"^s requestmg a paraUel sort [FastSort]. FastSort gets 
considerable speedup on large files (over 1MB) by usilig multiple processors !^d 

The unique features of the compiler are detailed below. 
Subcontracting Single-Variable Queries to Disk Processes 

The Nonstop SQL disk process wiU execute any selection, update or delete 

X rin ^*^^^^^^ ^"^.^ -- one involving only columA Xes of a skg e 
table, relational operators, and literals. For example, only a single message k 
needed to execute the command: . " uy ^ iingie message is 

UPDATE =account 

SET balance = balance * 1.07 

WHERE balance >0; 
which pays 7% interest to all positive accounts. The use of a single message to do 
^t updates deletes and. selections is the key optimization of Nonslp SqK ^^ 
be seen m the performance section, the consequent reduction of message traffic ^ 
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Sequential Block Buffering or Portals 

Data specified by a single- variable query may be returned to the SQL executor in a 
block of records radier than a record-at-a-time -- this is the concept of portals 
described by Rowe [Rowe]. Two forms of blocking are supported, "physical" in 
which a physical disk page is returned, and "virtual" in which a projection and 
selection of the data is returned. Clearly, if the single-variable query is very 
selective, virtual blocking wiU have much reduced message cost and consequendv 
be very attractive. Much of the performance advantage of NonStop SQL over its 
Tandem predecessor (Enform) derives from intelligent use of virtual buffering 
Enform had only physical buffering. Physical buffering gives a factor of three 
over the record-at-a-time interface. Virtual buffering gives NonStop SQL an 
additional factor of 3 over physical buffering on many Wisconsm benchmark 
queries [Bittonj. Initially, the compiler's use of buffering was conservative 
because it unplied course granularity (file) locking. But once the benefits of 
buffermg were quantified, a form of block locking was implemented so that almost 
aU sequential queries can use buffering without locking the entire table. 

Compilation and Local Autonomy 

Data unavaUability can invaUdate a compiled plan. For example, if the index for a 
table is unavailable because its disk or communication line is down, then a plan 
usmg that mdex will be invalid. Local autonomy requires that the SQL statement 
be recompiled and executed to use the avaUable data and access paths. The NonStop 
i>QL compiler implements this requirement by using the partial information in the 
local catalog. 

Invalidating Compiled Statements 

When the da^base design changes or the table statistics change, compiled plans 
become invalid. NonStop SQL's approach to invalidation differs from other 
systems m two ways. First. NonStop SQL invalidates plans whenever there is a 
chance diat the change may affect the plan. For example, most SQL systems do not 
automatically recompile when an mdex is added. This could confuse a user who 
added an mdex and noticed that it did not help performance at aU. NonStop SOL 
wiU automaticaUy recompile the program in this case and in any other case where 
the plan may change. 

NonStop SQL also differs in the way it invalidates programs. Most SQL systems 
mvahdate SQL compilations by clearing the valid bit of the plan in the catalog In 
such systems, whenever a program executes a transaction, it first reads the vahd bit 
from the catalog to assure that it is stiH valid and then proceeds. If the bit is off 
the program automaticaUy re-SQL-compiles itself 
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Nonstop SQL rejected this approach because the valid bit can become a botfl^n^.i. 
database may change while some program is inaccessible So NonSfnn^or 



the teSd dLtfte ope]? if f ^""^' '^^ *" P''°S"'^'^ authorization to 

causesanewOPEN of the table withamoremodenfmbTe^S^^,;;;;^^ " '"'" 

SQL recompUaloTof p "g~d fo?S ^oT'"'- ™ m"" '^ ^ •""' ^"^ ^'^""^ 
when the process first smm^uln ^'^ recomp.lat.on of the program 
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Figure 7. Plan validation and invalidation in NonStop SQL is done with timestamps. 
The plan records the redefinition timestamp of each table. When the plan opens the 
table it presents this timestamp which is matched with the timestamp in the file 
label. If the timestamps don't match, the plan is tavalid. If the table changes, the 
open is invalidated. This design reduces catalog access when compared to other 
implementations. 

Table Opens vs Cursor Opens 

The Tandem system is designed for on-line transaction processing. In such 
systems, early binding is the watchword. Programs are compiled, installed and the 
system is brought up. Thereafter, the system might run for several months without 
change or intermption of service. 

The ground rule in such systems is to do all the checking at startup so that there are 
no extra instructions in the "normal" path for the next few months. As a 
consequence, the NonStop SQL executor OPENs tables when they are first 
referenced by the application, and keeps the tables OPEN until the execution plan is 
invalid and a new OPEN with a new redefinition time is needed. Subsequent 
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references to the table by any statement of the process wiU share this single OPEN 

0p'EN'"Ti"'0PrNLr.'rh "' '''"^' ^" *^ ^^^^^' ^^^^ ^''' ^" use'l sle 
ui-tN. 1 he OPEN serves three purposes: it covers the redefinition/mvalidation 

NonStop Operation 

As the name suggests, NonStop SQL provides NonStop operation Not onlv is the 
application program protected by transaction locking iid loggtog ^addition an 
device dnvers and system processes run in NonStop mode so lat flify tokme a^v 
smgle hardware faUure and many software failures without disrupttagCrvS ^ 

^™,v!'''^™'?T "f"^" "''"°'^^ ^'^^^ ^° *at a disk failure wiU not dismm 
service. One disk can be repaired and revived while the mirror disk is operathig "^ 

If a process or processor fails, aU transactions involved in that process or orocessor 
are aborted (undone), but unrelated transactions are unaffected by the faTure 
s°=dt orU::rrs.^'*"'^^ -' communications lines) iJZ^^^ 

•Hie network hides link failures as long as an alternate path is avaUable. 

fcceS't^^f- °^ " ^''' ''" inaccessible, aU available partitions are still 
accessible. Partitions are reopened on demand when they become available 
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PERFORMANCE 

Single-variable Query Processing Performance 

cess, the disk s.rye7cTrfl^rie\lSTZ^r''°\f'^^^^^ 
records per second on a VLX pr^,4ssor ni.L It ''' " ^""^ "undred-byte 
can update a column at 600 recoTdTner ;p^nH n 7 '^°'"P^""™- ^"^ "Pdates, it 
Aese are transaction audited) '' ™" " ' similar speed (aU 

Performance on the DebitCredit Benchmark 

o'Sr:tirn:!tfdidtofrcl,rre Sf ? °f .^'"f'--^^'"^ ^--« for set 

original goal was to m t* Sfperfo^rc^ "f Z S^^t '"""'h °'''^'"''- '^' 
to within 25% when executinrfhr^f-.^^ *"' '^''°''''"^'-''-''™« interface 

transaction proces^g tvSnlliT^^^'^-'^^tl:^^^:- """''"^ 

a-time inteXce S su^nS r^„^^ *" same number of ios as the i^cord-at- 
would have a shorte paSeS ^°„°'gi?f / 1^''^^^ *at CICS plus SQL/DS 

management routines; yeCtki'^Xoi^Ltem^r' "' ""^^ ^'^' "^'^ 

s';;t^js.''rror hXTch'^Vr:^^^ -- *^^ '-"^^ ""■'^ ^ "^ -- 

that fewer statemen™^ med T^^nf^ u"^' °'°'^' '° *"■« '^ ^ POSsibiUty 
Insult that the DeWtCrel^Tacti^ fastr^'tf sorfr'^"^^ the »^ri4 
this in detail, consider the two nropr^m. r! ? ^^ *"" °° inscribe. To see 
the code for ie two apphcall^m'^f ^°°""« '"°*"' *^ ''^'^''^^^ ^^P^^''' of 

COBOL RECORD AT A TIME COBOL . SQL 

READaccountWriHLOCK llpnATc 

KEY IS act-number. !5P^^ "^"n' 

ADD delta TO accountbalance SFTtajancg ^ ,,3j^^^ ^ ^^j^^ 

REWRITE account WHERE number = act-number; 

READ teller WITH LOCK 

KEYISldlcr-num S;M^'="" 

ADD delta TO teller.balance r.SL'Si^'* = ''=J^™== + <i«Ita 

REWRITE teller, "HERE number = :teller-num; 
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READ branch WITH LOCK UPDATE branch 

KEY IS branch-number. SET balance = balance + delta 

ADD delta TO branch.balance. WHERE numbep=:branch-number, 
REWRFTE branch. 

INSERT INTO HISTORY VALUES ( 
MOVE timestamp TO history.timestamp. :timestamp, 

MOVE act-number TO history.act-number. :act-number, 

MOVE teller-number TO history.teller-number. :teller-num, ' 

MOVE delta TO history.delta. -delta)- 

WRTTE history. 

The record-at-a-time interface has seven calls - 3 reads, 3 writes, and an insert. An 
IMS or DBTG interface would have seven similar calls. But SQL is able to do the 
application in four calls - 3 updates and an insert. This economy translates into a 
three message savings on a base of seven, a 40% savings in messages! Such savings 
make SQL a wonderful interface for message based systems. 

As a consequence of these savings, the SQL cpu cost is slightly smaller than the 
record-at-a-time cost (about 4% less over all and about 10% reduction in the 
database path-length). This savings is achieved while still providing the added 
functionality of SQL. 

Tandem's smallest system, the EXTIO was benchmarked at 4TPS. The benchmark 
was then scaled up to 4, 8, 16 and 32 VLX cpus to demonstrate modular growth of 
the system and transparent data distribution between the EXT 10 and the VLX 
complex. A plot of the TPS rating of the VLX system as processors were added is 
shown in Figure 8. 

The beta release of the software (March 1987) was benchmaiked at 208 DebitCredit 
transactions per second (TPS) on a 32VLX processor cluster. At this high rate, 
each transaction does one physical read and one mirrored write on average; all 
other ios are buffered across many transactions. This configuration showed no 
bottlenecks and so could have been grown much larger. The price performance of 
NonStop SQL is competitive with the very best transaction processing systems. 

There are still many opportunities to improve performance of NonStop SQL m 
later releases. In particular, SQL lends itself to the IMS FastPath techniques of 
Field Calls and optimistic locking hidden under the SQL language interface 
[Gawlick]. 
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TRANSACTIONS PER SECOND 
vs 
PROCESSORS ® 

208 



106 



8 1ft 

CPUs 24 32 

Figure 1 . Not sho^vn in diat figureTri 2 n^^.? "^ configuration is shown in 

^^^^T:^:rXz!:i:z::^::', 7 r-'^-^- - - ^' "^ses on the 

which aUow modular grow* and loWhicrf"' f^ *""''«^'J databases 
improves peifoimance 5nd fiinctionaJ^fv .tTl^. "^^ ''"' "'^ ""^^ges and so 
had predicted that relational sySrwouM™»f"!,-'™.t ^ ^^«". Ted Codd 
[Codd]. Codd was right. " ""*' distnbuted databases possible 

Ad Hoc Query Performance 

P^lrffran^fon":^ Z .tStrBufo^? *^r-rt "-nchniarlc to test its 
available for other relational svstemr^n .^'"'^"'^raWe performance data is 
M68000 to a IBM 4381-P13ld" dTZv° "T-"*"^ ™g'"g ^onr a 
these data to NonStop SQL's pSfomSSfstcrth J^t''^ ^' I' '^'«'="" '° <=0'"Pa.« 
are quite different fmm fe oLr s^T. '' *' ^'^^^"^ ''^'<'™fe '"d software 

^:f ''.^iX:^^^^^^^^^^^ SQL and Enforni. the 

Nonstop SQL perfonns twice as 111? as i Ti ^°'=°'"P^^^- As a median, 
Tins speedup comes from better p a^s in some ca?etSr' °" ?t '™« ^'"''^'^ 
from the distributed que^- execution imStn'^^h™'' °^*''P*«''"P '=»«'«« 
quenes to the disk process. '^ ^" *" subcontracting single-variable 
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Indirect comparisons suggest that NonStop SQL does about as well as the best 
centralized relational systems when run on comparable hardware. 

Performance Observations 

The performance assurance of NonStop SQL exposed some language features 
which have inherently bad performance and which require language extension to 
correct. These are discussed in the following sections. 

The Halloween Problem 

Several SQL implementations have a famous performance bug called the Halloween 
problem bug. The Halloween problem was discovered one trick-or-treat eve, 
hence the name. It applies to queries of the form 

UPDATE payroU 

SET salary = salary +1; 

If this query uses an access path ascending on salary, then it will try to give all 
employees an infmite raise. First it will increment all the zero salary people giving 
them a salary of 1. Then it will give everyone with a salary of 1 a raise, and so on. 

As a consequence, many SQL systems refuse to use an index on a field mentioned in 
the update clause. This implementation gives very poor performance on common 
queries. Consider the query: 

UPDATE employee 

SET department = 100 

WHERE department = 55; 
the idea of not usmg the index on department number is a poor decision. It implies 
scanning the entire table. This is not acceptable for tables of credible size. In 
addition, this is not an instance of the Halloween problem. 

NonStop SQL can use an index for an update statement if: 

The value clauses (right hand side of assignments) do not involve fields of the 

index or, 
The where clause fully specifies the index key as equal to literals or host 
language variables. 
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!!bove au.'nfil",""'' ""' *^^ ~'" ™™' "^^y '=<"""°" ^ifi'^'ions. The example 
depaL'ent ' '"'" ""'' " "'""^^ "o* "^ '"ese criteria for an iZ^t 

mTuPDA^nN'Tuy'^'"^ ^°^"P'^'*• N°"Stop SQL insists that the clause 

avoid. The SQL executor refuses to update undeclared fields via such a cursor 
Group support 

^%T^"^ ^'"^"^ "^" '''"'"^ '' ^ ^^^"^^"^^ ^"^^^- So the Cobol group 
04 MM PIC(99). 
04 DD PIC(99). 
04 YY PIC(99). 

.Jd^nave^^rd1c^^?,^;pXS'oTrur ^^^ 

.tsS'^tiMi'^^-r—sSeir™'-^-—-- 

enti. record asast4offyran?rr^:rr4rSr^:t::^^^^ '^="^ *^ 
TTie solution to this is to provide group support in the SQL language. 
Parameters at Compile time 

FROM accounts 

WHERE accouuLnumber BETWEEN :min AND :max- 



SELECT customer 

FROM accounts 

WHERE account_number BETWEEN 50000 AND 60000; 
then the optimizer will guess that 10,000 records will be selected. This disparity 
may result in bad query execution plans. To our knowledge, all relational systems 
suffer from this illness. This problem is serious for SQL programs; almost all 
SQL statements will involve host-language variables rather than literals. It is trou- 
blesome that SQL optimizers are unable to guess set cardinalities in this case. 

There are three proposals (not implemented) to deal with this problem. The first is 
to let the user give the compiler hints. 

The second idea is to execute the program in "training mode" with sample pa- 
rameters, compile the statements on the fly, and save the plans used for the 
presented literals as the plan to be used in production. 

A third scheme is to set a threshold for guesses. If the compiler cannot find a cheap 
plan when host language variables are present, it would simply not compile at static 
compilation time. At run time, the statement would be compiled dynamically on 
first invocation with the Hterals bound in for the host variables. This is similar to 
the first scheme but is more automatic. 

Update Statistics 

The SQL compiler picks execution plans based on estimated table sizes, record 
sizes, mdex selectivity, and for each field, the number of unique values, and the 
nunimum and maximum values. All these indicators are collected by a utility SOL 
command called UPDATE STATISTICS. 

For the 20GB database used in the DebitCredit benchmark, the first implementation 
of this utUity would have taken about a day to collect aU the statistics on the various 
tables. Since NonStop SQL is intended for databases far in excess of 20GB 
somethmg had to be done. UPDATE STATISTICS now samples the database and 
estimates the statistics. Now its maximum running time is a few minutes. 

A complete scan is done for smaU tables (<4MB). For a 20GB table, a 64,000 bit 
hash table is aUocated for each field; and, 32,000 records are examined. The values 
of each record's fields are hashed into its hash table; and, the min and max value of 
each field is maintained. After all these records have been examined, the mmimum 
value, maximum value, and number of unique values are estimated. This algorithm 
is in the spirit of Sample Counting described by [Astrahan-2]. 
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SUMMARY 



* ^^^^^^: ' ^^^^ ^^^^^^-^^ ^-' ^^-^^^-d execution, 

nomwf '^T^"' 'i"'^"" ^^' """' °" '"^^^ (30K$) computers and on large 
(30M$) computers and on many sizes in between. ^ 

A Nonstop SQL - it tolerates any single fault without interrupting 



service. 



The first high-performance SQL - it has been benchmarked at over 200TPS 
with no bottlenecks in sight. 

An SQL with a cost per transaction comparable to the ueUest record-at a tim^ 
high-performance data management systems. at-a-time 

eliv'noSr"' ' " ""*'' " "^^ *^ ''^°™'"'°" -"'- -0 in Production 

^Zh'^!^^}' *'l°'*' "/ *^ ""y* *=" relational systems are inherendv slow 
effilnt cod^"" ''^°""' "' *' "^^ *^' assembly language was Squired fo; 

system dv?stsl°rii^'' semantics and a message-based distributed operating 
system gives a surpnsmg result the message savings of a high-level interfar^mv 

MeT.cT """"" °"^' ''^^ '^"^"^^^ when'compare'd to rl^d-t'tS^ 
Nonstop SQL is the fim SQL system to be integrated with an operating system 

ex.nded to support default'c'^UsT*: au'^lL^z: irmeTJsr^^^^^^^^ 

counter, in addition, the implementation integrated SOL obiects and v^rh. ^th 
the Tandem tmnsaction manager, network and NonStop mechSs 

TOs integration comes at some cost: it is not portable to other machines On the 
other hand it has considerable benefits in usability. simplicityXTrfon^an^e 



38 



REFERENCES 

[Anon] Anon et al., "A Measure of Transaction Processing Power- 

Datamation, V. 31.7, April 1985, pp. 1 12-1 18. 

[ANSI] "Database Language SQL", American National Standard X3.135- 

1986. 

[ANSI-2] "Database Language SQL 2 (ANSI working draft)", ANSI X3H2 87- 
8. December 1986. 

[Astrahan] M. Astrahan et al., "System R: a Relational Approach to Database 
Management", ACM TODS 1.2, June 1986. 

[Astrahan-2] M. Astrahan et al., "Counting Unique Values of an Attribute Without 
Sortmg", IBM RJ 4960, December 1985 

[Bartlett] J. Bartlett, "A NonStop Kemel", Proc 8th ACM SOSP, Dec. 1981. 

[Bitton] D. Bitton et al., "Benchmarking Database Systems: A Systematic 
Approach", Proc. 9th VLDB, Nov 1983. 

[Borr] A. Borr, "Transaction Monitoring in Encompass", VLDB, Sept. 1981. 

[Codd] "^^^^"^^^^ Database: A Practical Foundation for Productivity", 

[Date] An Introduction to Database Systems, Volume 1, Addison Wesley 

April 1986. •" 

[DB2] IBM Database 2 General Information Manual, IBM Fomi No GC 26- 

4073-2, Annonk, NY, Feb. 1986. 

[FastSort] A. Tsukerman et al, " FastSort: An External Sort Using ParaUel 
Processmg , Tandem Technical Report 86.3, Cupertino. CA, May 

[Gawlick] ?:Gawlick "Processing Hot Spots in High Perfoimance Systems", 
Proc. IEEE Compcon, Feb. 1985. 

[Gray] J- Gray, et al. "Granularity of Locks and Degrees of Consistency in a 

Shared Database , Modeling in Database Management Systems, GM 
Nijssen ed., Jan 1976. 



39 



[Informix] Informix SQL Relational Database System, Users Guide, Part No. 
200-41-1015-8, Relational Database Systems Inc., Menlo Park, CA 
June 1986. 

[Ingres] The INGRES Papers: The Anatomy of a Relational Database 
Management System, M. Stonebraker ed., Addison Wesley, May 1985. 

[Lindsay] B. Lindsay, "Object Naming and Catalog Management for a 
Distributed Database Management System", 2nd Int. Conf. on 
Distributed Computer Systems, IEEE, April 1981. 

[Mohan] C, Mohan et al., "Transaction Management in the R* Distributed 
Database Management System", ACM TODS, VI 1.4, Dec. 1986. 

[Measure] Measure User's Guide, Part No. 82440, Tandem Computers Inc, 
Cupertino, CA, Dec. 1986. 

[NonStop SQL] Introduction to NonStop SQL, Part No. 82317, Tandem Computers 
Inc, Cupertino, CA, March 1987. 

[NonStop SQL-1] NonStop SQL Benchmark Workbook, Part No. 84160, Tandem 
Computers Inc, Cupertino, CA, March 1987. 

[Pathway] Introduction to Pathway, Part No. 82339, Tandem Computers Inc, 
Cupertino, CA, June 1985. 

[Pathmaker] Introduction to Pathmaker, Part No. 84070, Tandem Computers Inc, 
Cupertino, CA, March 87. 

[Palermo] F. Palermo, "A Database Search Problem", Information Systems- 
COINS N, J. Tou ed.. Plenum, 1974. 

[Rowe] L. Rowe, "Database Portals: A New Application Programming 
Interface" VLDB, Aug 1984. 

[SafeGuard] SafeGuard User's Manual, Part No. 82539, Tandem Computers Inc, 
Cupertino, CA, Feb. 1987. 

[SQL/DS] SQL/Data System Concepts and Facilities, IBM Form No GH24-50 1 3 
Armonk, NY, Feb. 1982. 

[Teradata] "The Genesis of a Database Computer: A Conversation with Jack 
Shemer and PhU Neches of Teradata Corporation", IEEE Computer, 
V?.?, Nov. 1984. 



40 



[Williams] R. Williams et al., "R*: An Overview of the Architecture" IBM 
Research Report RJ3325, San Jose, CA, Dec 1 98 1 . 



41 



